(54) Storing data of an XML-document in a relational database



(57) XML documents are used to put structured data
into a file. In some cases, e.g. when XML files are used
for data exchange between database servers or when
queries are to be made on XML documents, XML files
have to be imported into a database. A method for importing
data from an XML document containing a plurality
of elements and attributes into a relational database
comprises the steps of: creating an element table (210)
for storing data of the plurality of elements, creating an
attribute table (220) for storing data of the plurality of
attributes, storing, in the element table (210), an element
data set containing an element ID for every one of
the plurality of elements, and storing, in the attribute table
(220), an attribute data set for every one of the plurality
of attributes, the attribute data set containing an attribute
name and attribute value and the element ID of
the element to which the attribute is assigned. The method
provides a fixed database model for different XML
documents and allows a simple creation of a database
and simple data import into and export from the database.
Description

FIELD OF THE INVENTION

[0001] The present invention relates to a method and an apparatus for storing data of an XML-document in a relational
database and to the resultant data structure.

DESCRIPTION OF THE RELATED ART 

[0002] Within a short time after its standardization, the Extensible Markup Language (XML) has become increasingly
popular among software developers, in particular for world-wide-web applications. XML is on the way to becoming a
worldwide standard for the creation of structured web-based documents.

[0003] XML can be regarded as a meta language for describing markup languages and provides facilities to define
tags and structural relationships between them. XML is a platform-independent set of rules for putting structured data
into a file. With XML it is fairly easy to separate the content data from the presentation or formatting information.
[0004] XML-documents are increasingly used for the exchange of data between different database servers, for example
in electronic commerce applications. In this case, when XML files are used for data exchange between two
database servers or when queries are to be made on large XML-documents, these XML files have to be imported
into a database. While some database management systems are, so to say, "XML enabled", there is currently no solution
available to store XML-documents in any relational database system.

[0005] Databases and XML offer complementary functionality for storing data. Databases store data for efficient
retrieval, whereas XML offers an easy information exchange that enables interoperability between applications.
[0006] For converting the data of an XML-document into a database, a database model has been proposed which is
based on the structure of the XML files as given in the document type definition (DTD). This database model uses one
database table for each element of the XML-document. The database model therefore depends on the specific XML-document,
for example on the number of elements. This approach has a number of drawbacks. The database creation is
complex and time consuming since the DTD must be parsed in order to create the database model. The data import
from the XML-document is also quite slow since a database table has to be created for each XML element.
[0007] There is therefore a need for a simple, fast and efficient method for transferring data from an XML-document
into a relational database.

SUMMARY OF THE INVENTION 

[0008] The present invention provides a method of storing, in the form of a relational database, data from a markup
document containing a plurality of elements and attributes, the method comprising the steps of creating an element table
for storing data of the plurality of elements, creating an attribute table for storing data of the plurality of attributes,
storing, in the element table, an element data set containing an element ID for every one of the plurality of elements,
and storing, in the attribute table, an attribute data set for every one of the plurality of attributes, the attribute data set
containing an attribute value and the element ID of the element to which the attribute is assigned.
[0009] The present invention uses a fixed database model for storing the data from the XML-document in the database.
In this model one database table is created for storing the XML elements of the XML-document and a further table is
created for storing the attributes, including the attribute values, of the XML-document. Formatting information and the
like is not stored in the database; only the content information is extracted from the XML-document. The database creation
is therefore greatly simplified since one database model, once created, can be used for all XML-documents. The
data import is also simplified since a standard XML parser can be used to extract the elements and attributes. The retrieval
of data from the database for creating a new XML-document is also simple since all necessary content information
can be easily extracted from the two tables.

[0010] Preferably an element data set contains the character data contained in the respective XML element. It is, however,
also possible to create an extra table for storing the character data.
[0011] In order to reflect the hierarchical structure of the elements of the XML-document, an element data set preferably
contains the ID of the parent element of the XML element, if such a parent element exists.
[0012] In order to facilitate sorting operations or the like, it is possible to assign an additional number to an element
and to store this number in the respective element data set.

[0013] The element data set may contain, besides the assigned element ID, the element name from the XML-document.
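
The retrieval direction mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the row layouts `(id, parent_id, name, pcdata)` and `(element_id, name, value)`, the function name `rebuild_document`, and the use of Python's `xml.etree.ElementTree` are all hypothetical choices. It further assumes that element IDs were assigned in document order, so that every parent has a smaller ID than its children.

```python
import xml.etree.ElementTree as ET

def rebuild_document(element_rows, attribute_rows):
    """Rebuild XML text from (id, parent_id, name, pcdata) rows and
    (element_id, name, value) rows (hypothetical layouts)."""
    nodes = {}
    root = None
    # Assumption: IDs follow document order, so sorting by ID visits
    # every parent before its children.
    for element_id, parent_id, name, pcdata in sorted(element_rows, key=lambda r: r[0]):
        node = ET.Element(name)
        node.text = pcdata
        nodes[element_id] = node
        if parent_id is None:
            root = node                      # the element without a parent is the root
        else:
            nodes[parent_id].append(node)    # the parent ID restores the hierarchy
    for element_id, name, value in attribute_rows:
        nodes[element_id].set(name, value)   # attach each attribute to its element
    return ET.tostring(root, encoding="unicode")

xml_text = rebuild_document(
    [(1, None, "Example", None), (2, 1, "Element", "A text"), (3, 2, "SubElement", None)],
    [(2, "attribute1", "aa"), (3, "attribute", "cc")],
)
print(xml_text)
```

Only the parent ID is needed to restore the nesting; an additional element number, where stored, could serve as the sort key instead of the element ID.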

[0014] According to a particular embodiment of the invention, an additional element name table is created containing
data sets for all elements of the XML-document having different names. To each name an element name ID is assigned,
and the element name data set contains the element name ID and the corresponding element name. The element data
set then has to contain only the element name ID instead of the complete element name. If an XML-document contains
a large number of elements having the same name, this embodiment can achieve a substantial reduction of the
memory space necessary for the database.

[0015] In a similar way, according to a further embodiment, an additional table is created containing every attribute
name appearing in the XML-document and a corresponding attribute name ID. It is then sufficient if the attribute table
contains only the attribute name ID instead of the full attribute name.

[0016] The present invention further provides a data structure comprising an element table for storing a plurality of
data sets corresponding to a plurality of elements of a markup document, an element data set containing an assigned
element ID and element data, and an attribute table for storing a plurality of attribute data sets corresponding to a plurality
of attributes of a markup document, an attribute data set containing attribute data and the element ID of the element
to which the attribute is assigned.

[0017] The present invention still further provides a computer system comprising an input unit for inputting a markup
document containing a plurality of elements and attributes, a processing unit for creating an element table for storing
data of the plurality of elements and an attribute table for storing data of the plurality of attributes, and a storage unit
for storing, for every element, an element data set containing an assigned element ID and element data in the element
table and for storing, for every attribute, an attribute data set containing attribute data and the element ID of the element
to which the attribute is assigned, in the attribute table.

[0018] A still further implementation of the present invention provides a computer program comprising program code
for transferring data from a markup document into a relational database by carrying out the steps of creating an element
table for storing data of the plurality of elements, creating an attribute table for storing data of the plurality of attributes,
storing, in the element table, an element data set containing an element ID for every one of the plurality of elements,
and storing, in the attribute table, an attribute data set for every one of the plurality of attributes, the attribute data set
containing attribute data and the element ID of the element to which the attribute is assigned.
[0019] A program code may be embodied in any form of a computer program product. A computer program product
comprises a medium which stores or transports computer readable code, or in which computer readable code may be
embedded. Some examples of computer program products are CD-ROM or DVD-ROM disks, ROM cards, magnetic
storage media like floppy disks, magnetic tapes or computer hard drives, servers on a network and signals transmitted
over a network representing a computer readable program code.

[0020] With the present invention the content information of an XML-document can be transferred quickly and efficiently
into a relational database, where search and query operations can be performed much better than on the
XML-document itself.

[0021] The above-mentioned and other features, utilities and advantages of the invention will become more readily
apparent from the following detailed description of particular embodiments of the invention as illustrated in the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022]

Figure 1 is a schematic diagram illustrating the operation of converting an XML-document into a relational database
according to an embodiment of the present invention.

Figure 2 is a schematic illustration of a database model of a further embodiment of the present invention.

Figure 3 is a flowchart illustrating an embodiment of the method according to the present invention.

Figure 4 is a flowchart illustrating a method according to a further embodiment of the present invention.

Figure 5 is a schematic illustration of a computer system of the present invention.

DETAILED DESCRIPTION OF PARTICULAR EMBODIMENTS

[0023] According to the present invention, database tables are created for the following content types of XML elements
as given in the XML specification, parts of which are reproduced in the following.
Element

[39] element   ::= EmptyElemTag | STag content ETag
                   [ WFC: Element Type Match ]
                   [ VC: Element Valid ]

Start-tag

[40] STag      ::= '<' Name (S Attribute)* S? '>'
                   [ WFC: Unique Att Spec ]
[41] Attribute ::= Name Eq AttValue
                   [ VC: Attribute Value Type ]
                   [ WFC: No External Entity References ]
                   [ WFC: No < in Attribute Values ]

Content of Elements

[43] content   ::= (element | CharData | Reference | CDSect | PI | Comment)*



[0024] The XML elements including character data and the attributes including the corresponding attribute values
form the most important part of the content of the XML-document. According to the present invention only this content
is stored in the database, whereas formatting information or the like is discarded.

[0025] Figure 1 shows schematically the operation of transforming an XML-document 100 into a relational database.
The database comprises two tables, namely an element table 210 and an attribute table 220. For every element of the
XML-document an element ID is assigned and a corresponding data set is created in the element table. Besides the element ID,
the element data set contains element data, for example character data, in the same line of the table. The second
table is the attribute table 220. For every attribute contained in the XML-document 100 a data set corresponding to
one line of the attribute table 220 is created. The data set contains the element ID of the element in which the attribute
appears and attribute data like the attribute name and attribute value.
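
As a minimal sketch of this two-table model, the following uses SQLite through Python's standard `sqlite3` module. The table and column names (`element`, `attribute`, `pcdata`, `element_id`) and the helper `create_fixed_schema` are assumptions made for illustration; the description only fixes what each table has to contain.

```python
import sqlite3

def create_fixed_schema(conn):
    """Create the element table and the attribute table (illustrative names)."""
    conn.executescript("""
        CREATE TABLE element (
            id     INTEGER PRIMARY KEY,  -- assigned element ID
            pcdata TEXT                  -- character data of the element, may be NULL
        );
        CREATE TABLE attribute (
            element_id INTEGER NOT NULL REFERENCES element(id),  -- owning element
            name       TEXT NOT NULL,                            -- attribute name
            value      TEXT NOT NULL                             -- attribute value
        );
    """)

conn = sqlite3.connect(":memory:")
create_fixed_schema(conn)

# One element data set and one attribute data set, i.e. one line per table.
conn.execute("INSERT INTO element (id, pcdata) VALUES (?, ?)", (2, "A text"))
conn.execute("INSERT INTO attribute VALUES (?, ?, ?)", (2, "attribute1", "aa"))

# The shared element ID links an attribute back to its element.
rows = conn.execute(
    "SELECT e.id, e.pcdata, a.name, a.value "
    "FROM element e JOIN attribute a ON a.element_id = e.id"
).fetchall()
print(rows)
```

Because the schema does not depend on any particular document, the same two `CREATE TABLE` statements can be reused for every XML-document that is imported.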

[0026] According to a further embodiment of the present invention additional tables are provided, namely an element
name table 211 and an attribute name table 221, as illustrated in Figure 2. The element table 210 contains the element
ID and, if the element has a parent element, the element ID of this parent element (parent ID), an element name ID,
an element number for facilitating sorting operations and the character data appearing in the element (PCDATA). The
element name table 211 contains a data set for every element name appearing in the element table 210.
Each data set contains the element name ID and the corresponding element name. The element name table 211
therefore forms a lookup table for the element name on the basis of the element name ID. This is advantageous and
saves memory space if an XML-document contains a large number of elements having the same (and probably a long)
name.

[0027] A similar lookup table is provided for the attribute names, namely the attribute name table 221. For every attribute
name appearing in the XML-document a data set is produced in the attribute name table, containing the attribute name
and the corresponding attribute name ID. The attribute table 220 then contains, for every attribute of the XML-document,
the element ID of the element in which the attribute is located, the attribute name ID and the attribute value.
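
The lookup-table idea can be sketched as a small "get or create" helper. This is a hedged illustration only: the table names `element_name` and `attribute_name` and the function `get_name_id` are hypothetical, and SQLite is used merely as a convenient relational database.

```python
import sqlite3

def get_name_id(conn, table, name):
    """Return the ID stored for a name, inserting the name on first use."""
    if table not in ("element_name", "attribute_name"):
        raise ValueError(f"unknown lookup table: {table}")
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]                     # name already known: reuse its ID
    cur = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cur.lastrowid                  # new name: SQLite assigns the next ID

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element_name   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE attribute_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
""")

# Six elements but only three distinct names, so only three names are stored.
tags = ["Example", "Element", "SubElement", "SubElement", "Element", "Element"]
name_ids = [get_name_id(conn, "element_name", tag) for tag in tags]
stored = conn.execute("SELECT COUNT(*) FROM element_name").fetchone()[0]
print(name_ids, stored)
```

Each repeated name then costs one integer per element or attribute instead of the full string, which is where the memory saving comes from.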

[0028] The method of storing data from a markup document, in particular an XML-document, in the form of a relational
database is now explained with reference to the flowchart of Figure 3.
[0029] First, the element table and the attribute table are created in method steps S1 and S2, respectively. In step
S3 the XML-document is inputted. Subsequently, in method step S4, an element data set is provided for every element
of the XML-document, containing an assigned element ID and element data. In the following step S5 an attribute data
set is provided for every attribute of the XML-document, containing attribute data and the element ID of the element in
which the attribute is located. Then the element data sets and the attribute data sets are stored in the element table
and the attribute table, respectively (steps S6 and S7). The data transfer from the XML-document into a relational
database is then finished.
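
The steps S1 to S7 can be sketched in Python as shown below. The sketch rests on several assumptions that are not part of the description: SQLite as the database, the standard `xml.etree.ElementTree` parser, the hypothetical function name `import_xml`, illustrative table and column names, and element IDs assigned in document order. Only the text directly following a start tag is kept as character data, which is a simplification.

```python
import sqlite3
import xml.etree.ElementTree as ET

def import_xml(conn, xml_text):
    """Import an XML string into an element table and an attribute table."""
    conn.executescript("""
        CREATE TABLE element   (id INTEGER PRIMARY KEY, name TEXT, pcdata TEXT);  -- S1
        CREATE TABLE attribute (element_id INTEGER, name TEXT, value TEXT);       -- S2
    """)
    root = ET.fromstring(xml_text)                                # S3: input document
    element_rows, attribute_rows = [], []
    # root.iter() walks the tree in document order.
    for element_id, elem in enumerate(root.iter(), start=1):
        text = (elem.text or "").strip() or None                  # drop pure whitespace
        element_rows.append((element_id, elem.tag, text))         # S4: element data set
        for name, value in elem.attrib.items():
            attribute_rows.append((element_id, name, value))      # S5: attribute data set
    conn.executemany("INSERT INTO element VALUES (?, ?, ?)", element_rows)      # S6
    conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)", attribute_rows)  # S7
    return len(element_rows), len(attribute_rows)

conn = sqlite3.connect(":memory:")
counts = import_xml(conn, '<a x="1"><b y="2">hi</b><b/></a>')
elements = conn.execute("SELECT * FROM element ORDER BY id").fetchall()
attributes = conn.execute("SELECT * FROM attribute ORDER BY rowid").fetchall()
print(counts, elements, attributes)
```

Collecting all data sets first and writing them with `executemany` mirrors the order of the flowchart, in which the data sets are provided before they are stored.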

[0030] The method of storing data from an XML-document in a database as illustrated in Figure 2 is now explained
with reference to the flowchart of Figure 4. An element table, an element name table, an attribute table and an attribute name
table are created in step S11. Then, after inputting the XML-document in step S12, an element of the XML-document
is detected in step S13 and an element ID is assigned in step S14. An element data set containing the element ID, element
name ID, parent element ID and character data is then stored in the element table (step S15). Then the element name
data set containing the element name and the corresponding element name ID is created if the element name stored in step
S15 has appeared for the first time in this document.

[0031] If the element contains attributes, an attribute data set containing the element ID, an attribute name ID and
the attribute value is stored in the attribute table in method step S17. Subsequently, if the attribute name appears for
the first time in the XML-document, an attribute name data set containing the attribute name and the corresponding attribute
name ID is stored in the attribute name table (step S18). In step S19 it is checked whether or not the XML-document
is finished. If not, the method proceeds to step S20 and continues with the next element. If the document is finished,
the data import to the database is completed.
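
The element-by-element loop of Figure 4 can be sketched as follows. All names in the code (`SCHEMA`, `import_document`, the table and column names) are hypothetical, SQLite and `xml.etree.ElementTree` are assumed as database and parser, and the optional element number is left out for brevity. Unlike the flowchart, the sketch stores a new name in its lookup table just before the data set that refers to it; the resulting tables are the same.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and column names; the description only fixes their roles.
SCHEMA = """
CREATE TABLE element_name   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE attribute_name (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE element   (id INTEGER PRIMARY KEY, parent_id INTEGER,
                        name_id INTEGER NOT NULL, pcdata TEXT);
CREATE TABLE attribute (element_id INTEGER NOT NULL, name_id INTEGER NOT NULL,
                        value TEXT NOT NULL);
"""

def import_document(conn, xml_text):
    conn.executescript(SCHEMA)                            # S11: create the four tables
    caches = {"element_name": {}, "attribute_name": {}}

    def name_id(table, name):
        """Return the ID of a name, storing it on its first appearance."""
        cache = caches[table]
        if name not in cache:
            cache[name] = len(cache) + 1
            conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)",
                         (cache[name], name))
        return cache[name]

    element_id = 0
    stack = [(ET.fromstring(xml_text), None)]             # S12: input the document
    while stack:                                          # S19/S20: until finished
        elem, parent_id = stack.pop()                     # S13: next element
        element_id += 1                                   # S14: assign element ID
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO element VALUES (?, ?, ?, ?)",   # S15
                     (element_id, parent_id, name_id("element_name", elem.tag), text))
        for attr, value in elem.attrib.items():           # S17/S18
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?)",
                         (element_id, name_id("attribute_name", attr), value))
        # Push children in reverse so that they are popped in document order.
        stack.extend((child, element_id) for child in reversed(list(elem)))

conn = sqlite3.connect(":memory:")
import_document(conn, """<Example>
  <Element attribute1="aa" attribute2="ab">A text
    <SubElement attribute="cc"/>
    <SubElement attribute="dd">Another text</SubElement>
  </Element>
  <Element attribute1="ee" attribute2="ef"/>
  <Element attribute1="gg"/>
</Example>""")
element_rows = conn.execute("SELECT * FROM element ORDER BY id").fetchall()
attribute_rows = conn.execute("SELECT * FROM attribute ORDER BY rowid").fetchall()
print(element_rows)
print(attribute_rows)
```

The explicit stack carries the parent ID along with each element, which is needed because `ElementTree` elements do not hold a reference to their parent.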

[0032] In the following the conversion of data from a markup document into a database is explained using an illustrative
example.

[0033] The XML file (without XML header) is as follows:

<Example>
  <Element name="1" attribute1="aa" attribute2="ab">A text
    <SubElement attribute="cc"/>
    <SubElement attribute="dd">Another text</SubElement>
  </Element>
  <Element name="2" attribute1="ee" attribute2="ef"/>
  <Element name="3" attribute1="gg"/>
</Example>

[0034] This XML file, containing in total six elements, six attributes and two text portions, is converted into the following
two database tables:

(1) element table

[0035]

Id   ParentId   XMLElementName   PCDATA
1               "Example"
2    1          "Element"        "A text"
3    2          "SubElement"
4    2          "SubElement"     "Another text"
5    1          "Element"
6    1          "Element"
(2) attribute table

[0036]

XMLElementId   XMLAttributeName   Value
2              "attribute1"       "aa"
2              "attribute2"       "ab"
3              "attribute"        "cc"
4              "attribute"        "dd"
5              "attribute1"       "ee"
5              "attribute2"       "ef"
6              "attribute1"       "gg"


[0037] If the additional tables, namely the element name table and the attribute name table, are also used, the database
resulting from the above XML file looks as follows:

(1) element name table

[0038]

Id   Name
1    "Example"
2    "Element"
3    "SubElement"


(2) element table

[0039]

Id   ParentId   XMLElementNameId   PCDATA
1               1
2    1          2                  "A text"
3    2          3
4    2          3                  "Another text"
5    1          2
6    1          2


(3) attribute name table

[0040]

Id   Name
1    "attribute1"
2    "attribute2"
3    "attribute"



(4) attribute table

[0041]

XMLElementId   XMLAttributeNameId   Value
2              1                    "aa"
2              2                    "ab"
3              3                    "cc"
4              3                    "dd"
5              1                    "ee"
5              2                    "ef"
6              1                    "gg"






[0042] The present invention may be carried out using any suitable hardware configuration involving a personal
computer, a workstation, a portable device or a network of computer devices. An example is schematically
illustrated in Figure 5. The computer comprises a main unit 10 including a central processing unit, input/output means
for connection with a communication network like the internet, a volatile memory etc. The computer system further
comprises a storage unit 11 for storing the database, a display unit 12 and an input unit 13 like a keyboard, a mouse
and/or speech processing means. The computer system may be connected over a suitable network to other devices
like a mobile computer 20.

[0043] While the invention has been particularly shown with reference to an embodiment thereof, it will be understood
by those skilled in the art that various other changes in the form and details may be made therein without departing
from the spirit and scope of the invention.
Claims

1. A method of storing, in the form of a relational database, data from a markup document (100) containing a plurality
of elements and attributes, the method comprising the steps of:

creating an element table (210) for storing data of the plurality of elements,
creating an attribute table (220) for storing data of the plurality of attributes,

storing, in the element table (210), an element data set containing an element ID for every one of the plurality
of elements,

storing, in the attribute table (220), an attribute data set for every one of the plurality of attributes, the attribute
data set containing attribute data and the element ID of the element to which the attribute is assigned.

2. The method of claim 1, wherein an element data set contains character data.

3. The method of claim 1 or 2 wherein an element data set contains a parent element ID.

4. The method of one of claims 1 to 3 wherein an element data set contains an element number assigned to the
element.

5. The method of one of claims 1 to 4 wherein an element data set contains an element name.

6. The method of one of claims 1 to 4 comprising the step of creating a further table for storing, for every element
name of the plurality of elements, a data set containing the element name and a corresponding element name ID.

7. The method of claim 1 comprising the step of creating a further table for storing, for every one of the plurality of
elements, a data set containing the element ID and element character data.



8. The method of one of claims 1 to 7 wherein an attribute data set contains attribute name and attribute value.

9. The method of one of claims 1 to 7 wherein an attribute data set contains an attribute name ID, the method comprising
the step of creating a further table for storing, for every attribute name, a data set containing the attribute
name and a corresponding attribute name ID.

10. The method of any one of claims 1 to 9 wherein the markup document is an XML document.
11. A data structure comprising:

an element table (210) for storing a plurality of data sets corresponding to a plurality of elements of a markup
document (100), an element data set containing an assigned element ID and element data,
an attribute table (220) for storing a plurality of attribute data sets corresponding to a plurality of attributes of
a markup document, an attribute data set containing attribute data and the element ID of the element to which
the attribute is assigned.

12. The data structure of claim 11, wherein an element data set contains character data.

13. The data structure of claim 11 or 12 wherein an element data set contains a parent element ID.

14. The data structure of one of claims 11 to 13, wherein an element data set contains an element number assigned
to the element.

15. The data structure of one of claims 11 to 14, wherein an element data set contains an element name.

16. The data structure of one of claims 11 to 14 wherein an element data set comprises an element name ID, the data
structure comprising a further table for storing, for each element name ID, a data set containing the element name
ID and the corresponding element name.

17. The data structure of claim 11 comprising a further table for storing a data set containing the element ID and the
character data of an element.

18. The data structure of one of claims 11 to 17 wherein an attribute data set contains attribute name and attribute value.

19. The data structure of one of claims 11 to 17 wherein an attribute data set contains an attribute name ID, the data
structure further comprising, for every attribute name ID, a data set containing the attribute name ID and the corresponding
attribute name.

20. A data set of one of claims 11 to 19, wherein the markup document is an XML document.

21. A computer program comprising program code for transferring data from a markup document into a relational
database by carrying out the steps of:

creating an element table (210) for storing data of the plurality of elements,
creating an attribute table (220) for storing data of the plurality of attributes,

storing, in the element table (210), an element data set containing an element ID for every one of the plurality
of elements,

storing, in the attribute table (220), an attribute data set for every one of the plurality of attributes, the attribute
data set containing attribute data and the element ID of the element to which the attribute is assigned.

22. The computer program of claim 21 wherein an element data set contains a parent element ID.

23. The computer program of claim 21 or 22 wherein an element data set contains an element number assigned to
the element.

24. The computer program of one of claims 21 to 23 wherein an element data set contains an element name.

25. The computer program of one of claims 21 to 23 comprising program code for creating a further table for storing,
for every element name of the plurality of elements, a data set containing the element name and a corresponding
element name ID.

26. The computer program of claim 21 comprising program code for creating a further table for storing, for every one
of the plurality of elements, a data set containing the element ID and element character data.

27. The computer program of one of claims 21 to 26 wherein an attribute data set contains an attribute name.
28. The computer program of one of claims 21 to 26 wherein an attribute data set contains an attribute name ID, the
computer program comprising program code for creating a further table for storing, for every attribute name, a
data set containing the attribute name and a corresponding attribute name ID.

29. The computer program of any one of claims 21 to 28 wherein the markup document is an XML document.

30. A computer program product comprising program code for transferring data from a markup document into a relational
database by carrying out the steps of:

creating an element table (210) for storing data of the plurality of elements,

creating an attribute table (220) for storing data of the plurality of attributes,

storing, in the element table (210), an element data set containing an element ID for every one of the plurality
of elements,

storing, in the attribute table (220), an attribute data set for every one of the plurality of attributes, the attribute
data set containing attribute data and the element ID of the element to which the attribute is assigned.

31. A computer system comprising:

an input unit (10) for inputting a markup document (100) containing a plurality of elements and attributes,
a processing unit (10) for creating an element table (210) for storing data of the plurality of elements and
an attribute table (220) for storing data of the plurality of attributes, and

a storage unit (11) for storing, for every element, an element data set containing an assigned element ID and
element data in the element table (210) and for storing, for every attribute, an attribute data set containing
attribute data and the element ID of the element to which the attribute is assigned, in the attribute table (220).
Fig. 3 (flowchart):

S1   Creating element table
S2   Creating attribute table
S3   Inputting XML document
S4   Providing element data set for every element of XML document and containing assigned element ID and element data
S5   Providing attribute data set for every attribute of XML document and containing an attribute name and value and element ID of corresponding element
S6   Storing element data sets in element table
S7   Storing attribute data sets in attribute table
END



